Abstract.

In this paper we elaborate on earlier work by the same authors in which a novel Bayesian inference framework for testing the strong-field dynamics of General Relativity using coalescing compact binaries was proposed. Unlike methods that were used previously, our technique addresses the question whether one or more 'testing coefficients' (e.g. in the phase) parameterizing deviations from GR are non-zero, rather than all of them differing from zero at the same time. The framework is well-adapted to a scenario where most sources have low signal-to-noise ratio, and information from multiple sources as seen in multiple detectors can readily be combined. In our previous work, we conjectured that this framework can detect generic deviations from GR that can in principle not be accommodated by our model waveforms, on condition that the change in phase near frequencies where the detectors are the most sensitive is comparable to that induced by simple shifts in the lower-order phase coefficients of more than a few percent (~5 radians at 150 Hz). To further support this claim, we perform additional numerical experiments in Gaussian and stationary noise according to the expected Advanced LIGO/Virgo noise curves, coherently injecting signals into the network whose phasing differs structurally from the predictions of GR, but with the magnitude of the deviation still being small. We find that even then, a violation of GR can be established with good confidence.



1. Introduction 

The theory of General Relativity (GR) is currently the most commonly accepted theory describing space-time and gravitation. The theory has been accurately tested within the weak-field, stationary regime (e.g. through Solar System tests [1]), and no deviations from GR have been conclusively found. However, by the very nature of GR, gravitation is a dynamical phenomenon, a key aspect being the prediction of gravitational waves (GWs). The first clue towards this dynamical aspect came with the discovery of the Hulse-Taylor binary pulsar [2]. Its orbital motion is in agreement with the assumption that GWs carry away energy and angular momentum as predicted by GR. This test, as well as similar tests performed on other binary pulsars that were subsequently discovered [3-6], still involves indirect observations of GWs. Furthermore, even for the most relativistic binary pulsar that is currently known (PSR J0737-3039), one has $GM/(c^2R) \simeq 4.4 \times 10^{-6}$ (where $M$ is the total mass and $R$ is the orbital separation) and a typical orbital speed of $v/c \simeq 2 \times 10^{-3}$. The observed binary pulsars are still very far from merger. However, given access to a sufficiently large volume of space, one should find compact binaries in the final stages of inspiral. At the nominal last stable orbit, these will have a separation of $R = 6GM/c^2$, so that $GM/(c^2R) = 1/6$ and $v/c = 1/\sqrt{6}$. The process of inspiral has been modelled in detail in the context of the so-called post-Newtonian (PN) approximation (cf. [7] and references therein). The direct detection of the gravitational wave signals from such binaries would enable us to probe the genuinely strong-field, dissipative dynamics of GR.
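The compactness figures quoted above can be roughly reproduced from Kepler's third law for a circular orbit, $R^3 = GMP^2/(4\pi^2)$, together with $v^2 = GM/R$. The sketch below assumes an orbital period of about 2.45 h and a total mass of about 2.59 $M_\odot$ for PSR J0737-3039; these approximate literature values are not given in this paper and are used only for illustration.

```python
import math

# Approximate physical constants (SI units)
G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
C = 2.998e8        # speed of light [m/s]
M_SUN = 1.989e30   # solar mass [kg]

def compactness_from_period(total_mass_msun, period_s):
    """Return (GM/(c^2 R), v/c) for a circular Keplerian orbit.

    Kepler's third law gives R^3 = G M P^2 / (4 pi^2); for a circular
    orbit v^2 = G M / R, hence v/c = sqrt(GM/(c^2 R)).
    """
    gm = G * total_mass_msun * M_SUN
    radius = (gm * period_s**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0)
    compactness = gm / (C**2 * radius)
    return compactness, math.sqrt(compactness)

# Assumed approximate values for PSR J0737-3039 (not taken from this paper)
compact_psr, v_psr = compactness_from_period(2.59, 2.45 * 3600.0)

# Nominal last stable orbit, R = 6GM/c^2
compact_lso, v_lso = 1.0 / 6.0, 1.0 / math.sqrt(6.0)

print(f"PSR J0737-3039:    GM/(c^2 R) ~ {compact_psr:.1e}, v/c ~ {v_psr:.1e}")
print(f"Last stable orbit: GM/(c^2 R) = {compact_lso:.3f}, v/c = {v_lso:.3f}")
```

The gap of more than four orders of magnitude in $GM/(c^2R)$ between the two cases is what motivates looking at binaries close to merger.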

Compact binary systems that are close to merger are in fact among the primary targets for kilometer-sized interferometric detectors. These include the Virgo detector in Italy [8, 9], the two LIGO detectors in the United States [10] and GEO600 in Germany [11]. Both Virgo and LIGO are in the process of being upgraded to Advanced Virgo and Advanced LIGO, which are expected to be completed around 2015 [12-15]. Furthermore, an interferometer named KAGRA (formerly known as LCGT) in Japan is in a planning stage [16], and a detector in India [17] is also being considered. With the current estimates for the advanced detectors, the rate of detection of inspiralling compact binaries is expected to be around a few tens per year [18].

Quite a number of alternative theories to GR have been discussed in the literature, and the accuracy with which some of these could be probed with gravitational waves has been studied within the Fisher matrix formalism; for scalar-tensor theories, see [19-24], for a varying Newton constant [25], for modified dispersion relation theories (commonly referred to as 'massive gravity') [23, 25-30], for violations of the No Hair Theorem [31-33, 35], for violations of Cosmic Censorship [35, 36], and for parity violating theories
[37-40]. Practical Bayesian methods for performing tests of GR on actual gravitational wave data when they become available include the work by Del Pozzo et al. [41] in the context of massive gravitons, that of Cornish et al. [42], which employed the so-called parameterized post-Einsteinian (ppE) waveform family [43, 44], and that of Gossan et al. [45], which focussed on the ringdown signal.

A proposal by Arun et al. [46-48] is to measure various quantities within the phase and to check their consistency with the predictions of GR. This could lead to a very generic test, in that one would not be looking for particular (classes of) alternative theories. However, so far its viability has only been explored through Fisher matrix studies.

Inspired by the method of Arun et al., the authors of the present paper developed a new Bayesian framework, with the following features [49]:

• Contrary to previous Bayesian treatments such as [42, 45], it addresses the question "Do one or more testing parameters characterizing deviations from GR differ from zero?" as opposed to "Do all of them differ from zero?" In practice this comes down to testing a number of auxiliary hypotheses, in each of which only a subset of the set of testing parameters is allowed to be non-zero;

• Precisely because in most of the auxiliary hypotheses a smaller set of testing parameters is used, this method will be more suited to a scenario where most sources have low signal-to-noise ratio, as we expect to be the case with Advanced LIGO/Virgo;

• As with most Bayesian methods, information from multiple sources can easily be combined;

• The framework is not tied to any particular waveform family or even any particular part of the coalescence process. However, in [49] we focussed on the inspiral part and chose the testing parameters to be shifts in the lower-order inspiral phase coefficients, as we will also do here.



Besides establishing a theoretical framework, [49] also showed results for a few simple example deviations from GR. In particular, it was illustrated how the method can be sensitive to deviations which in principle cannot be accommodated by the model waveforms. In fact, it is reasonable to assume that the technique will be able to pick up generic deviations from GR, on condition that their effect on the phasing is of the same magnitude as that of a simple shift in one or more of the low-order phasing coefficients of the standard post-Newtonian waveform. More precisely, as long as the change in the phase at frequencies where the detectors are the most sensitive is comparable to the effect of a shift of a few percent at $(v/c)^3$ beyond leading order (corresponding to ~5 radians at $f = 150$ Hz), we expect the deviation from GR to be detectable by our method. In this paper, we will show some striking examples to provide further support for this claim.

Subsequent sections of this paper are structured as follows. In section 2 we recall the theory and the implementation of the method introduced in [49]. Section 3 shows results from simulations done with some specific examples of modifications to the waveform phase. A discussion and conclusions are presented in section 4.

2. Method 

2.1. Bayesian Inference 

At the heart of the method we proposed in [49] lies the question "to what degree do we believe GR is the correct theory describing the detected signals?" This question is best answered within the framework of Bayesian model selection [50]. The cornerstone of Bayesian analysis is the comparison between the probabilities of two hypotheses given the available data. This is quantified by the odds ratio

$$\mathcal{O}^{i}_{j} = \frac{P(\mathcal{H}_i|d,I)}{P(\mathcal{H}_j|d,I)}, \tag{1}$$

where $\mathcal{H}_i$, $\mathcal{H}_j$ are the hypotheses of the models to be compared, $d$ represents the data and $I$ is the relevant background information. Using Bayes' theorem, we can then write this odds ratio as

$$\mathcal{O}^{i}_{j} = \frac{P(d|\mathcal{H}_i,I)}{P(d|\mathcal{H}_j,I)}\,\frac{P(\mathcal{H}_i|I)}{P(\mathcal{H}_j|I)}. \tag{2}$$

The odds ratio is thus the product of two ingredients. The first factor is the ratio of the so-called evidences, $B^{i}_{j} = P(d|\mathcal{H}_i,I)/P(d|\mathcal{H}_j,I)$, which is also known as the Bayes factor. The evidence (also called the marginal likelihood) for e.g. the hypothesis $\mathcal{H}_i$ is given by

$$P(d|\mathcal{H}_i,I) = \int d\theta\, p(\theta|\mathcal{H}_i,I)\, p(d|\theta,\mathcal{H}_i,I), \tag{3}$$

where $\theta$ are the parameters associated with the hypothesis $\mathcal{H}_i$, and $p(\theta|\mathcal{H}_i,I)$ is the prior probability distribution of the parameters.

The second factor in Eq. (2) is the ratio of the prior probabilities, $P(\mathcal{H}_i|I)/P(\mathcal{H}_j|I)$, and is often referred to as the prior odds. It should be noted that the prior probability distribution is uniquely determined by the prior information.
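To make Eqs. (1)-(3) concrete, here is a toy sketch that is not part of the method itself: a single datum $d$ drawn from a Gaussian of known width $\sigma$, a hypothesis $\mathcal{H}_j$ in which the mean is fixed to zero (no free parameters, so its evidence is simply the likelihood), and a hypothesis $\mathcal{H}_i$ in which the mean $\mu$ is free with a flat prior on $[-a, a]$. The evidence of $\mathcal{H}_i$ is the integral of Eq. (3), evaluated by a midpoint rule and checked against its closed form in terms of the error function. All names and numbers are illustrative assumptions.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian likelihood p(d | mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def evidence_flat_prior(d, sigma, a, n_grid=20000):
    """Eq. (3) for H_i: integrate prior(mu) * likelihood(d | mu) over [-a, a].

    The flat prior density is 1/(2a); a midpoint rule approximates the integral.
    """
    h = 2.0 * a / n_grid
    total = 0.0
    for k in range(n_grid):
        mu = -a + (k + 0.5) * h
        total += (1.0 / (2.0 * a)) * gauss_pdf(d, mu, sigma) * h
    return total

def evidence_flat_prior_exact(d, sigma, a):
    """Closed form of the same integral via the error function."""
    s = sigma * math.sqrt(2.0)
    return (math.erf((a - d) / s) - math.erf((-a - d) / s)) / (4.0 * a)

d, sigma, a = 2.5, 1.0, 5.0              # illustrative datum, noise level, prior half-width
z_i = evidence_flat_prior(d, sigma, a)   # H_i: mean free
z_j = gauss_pdf(d, 0.0, sigma)           # H_j: mean fixed to zero
bayes_factor = z_i / z_j                 # B^i_j
prior_odds = 1.0                         # P(H_i|I)/P(H_j|I), assumed equal
odds = bayes_factor * prior_odds         # Eq. (2)
print(f"B = {bayes_factor:.3f}, O = {odds:.3f}")
```

Note how the flat prior on $\mu$ penalises $\mathcal{H}_i$ through the factor $1/(2a)$: widening the prior lowers its evidence even though the best-fit likelihood is unchanged.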



The assignment of the prior probability will be further explained in subsection 2.3. Details on the calculation of the Bayes factor can be found in subsection 2.6.



2.2. Waveform model 

Before moving on to define the odds ratio for the problem at hand, let us explain the model waveforms used in this paper. In the inspiral regime of compact binary coalescing systems, the waveforms are accurately described by the post-Newtonian approximation. This approximation describes important quantities such as the energy and the flux as expansions in powers of $v/c$, where $v$ is the characteristic velocity of the binary system.

To illustrate our method we will use the analytic, frequency domain TaylorF2 waveform model [54, 55], which is implemented in the LIGO Algorithms Library in the following way [56]:

where $D$ is the distance, $(\theta, \phi)$ the sky position in the detector frame, and $(\iota, \psi)$ the orientation of the orbital plane with respect to the line of sight. $\mathcal{M}$ is the so-called chirp mass, and $\eta$ is the symmetric mass ratio; in terms of the component masses $(m_1, m_2)$ one has $\eta = m_1 m_2/(m_1 + m_2)^2$ and $\mathcal{M} = (m_1 + m_2)\,\eta^{3/5}$. The phase $\Psi(\mathcal{M},\eta; f)$ takes the form

$$\Psi(\mathcal{M},\eta; f) = 2\pi f t_c - \phi_c - \frac{\pi}{4} + \sum_{i=0}^{7}\left[\psi_i + \psi_i^{(l)} \ln f\right] f^{(i-5)/3}, \tag{5}$$

with $t_c$ and $\phi_c$ the time and phase at coalescence, respectively. Central to the method are the phase coefficients $\psi_i$ and $\psi_i^{(l)}$. These either have a functional dependence on $(\mathcal{M},\eta)$ as predicted by GR (cf. [41] for the explicit expressions), or are allowed to deviate from the value predicted by GR. In Eq. (4), the 'frequency sweep' $F(\mathcal{M},\eta; f)$ is itself an expansion in powers of the frequency $f$ with mass-dependent coefficients. Note that $F$ is related to the phase $\Psi$, and we could in principle allow it to deviate from the GR prediction. However, for stellar mass binaries and with advanced detectors, we do not expect to be particularly sensitive to sub-dominant contributions to the amplitude [36], so we will keep the function $F$ fixed to its GR expression.
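A minimal sketch of Eq. (5) truncated at 2PN (the order used for the injections in section 3), together with the mass parameters defined above. It assumes the standard GR TaylorF2 coefficients written as $\psi_i = \frac{3}{128\eta}(\pi M)^{(i-5)/3}\alpha_i$ in geometric units ($G = c = 1$, total mass $M$ in seconds), with $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 3715/756 + 55\eta/9$, $\alpha_3 = -16\pi$, $\alpha_4 = 15293365/508032 + 27145\eta/504 + 3085\eta^2/72$, and no logarithmic terms below 2.5PN. This is not the LAL implementation, and the function names are hypothetical.

```python
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds

def mass_parameters(m1_msun, m2_msun):
    """Return total mass M [s], symmetric mass ratio eta, chirp mass [s]."""
    m_total = (m1_msun + m2_msun) * MSUN_S
    eta = m1_msun * m2_msun / (m1_msun + m2_msun) ** 2
    chirp = m_total * eta ** 0.6          # chirp mass = M * eta^(3/5)
    return m_total, eta, chirp

def gr_phase_coefficients(m_total, eta):
    """Assumed GR TaylorF2 coefficients psi_0..psi_4 in the form of Eq. (5)."""
    alpha = [
        1.0,
        0.0,                                   # GR predicts no 0.5PN term
        3715.0 / 756.0 + 55.0 * eta / 9.0,
        -16.0 * math.pi,
        15293365.0 / 508032.0 + 27145.0 * eta / 504.0 + 3085.0 * eta**2 / 72.0,
    ]
    pre = 3.0 / (128.0 * eta)
    return [pre * (math.pi * m_total) ** ((i - 5) / 3.0) * a for i, a in enumerate(alpha)]

def taylorf2_phase(f, psi, t_c=0.0, phi_c=0.0):
    """Eq. (5) using only the supplied non-logarithmic coefficients psi_i."""
    series = sum(p * f ** ((i - 5) / 3.0) for i, p in enumerate(psi))
    return 2.0 * math.pi * f * t_c - phi_c - math.pi / 4.0 + series

m_total, eta, chirp = mass_parameters(1.5, 1.5)
psi = gr_phase_coefficients(m_total, eta)
print(f"eta = {eta}, chirp mass = {chirp / MSUN_S:.3f} Msun")
print(f"Psi(150 Hz) = {taylorf2_phase(150.0, psi):.1f} rad")
```

Writing the coefficients with $(\pi M)^{(i-5)/3}$ pulled out keeps the frequency dependence $f^{(i-5)/3}$ of Eq. (5) explicit, which is convenient when individual $\psi_i$ are later shifted.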

We note that in the case of binary neutron stars, which are the sources we will in fact focus on, TaylorF2 waveforms are likely to already suffice for a first test of GR. Indeed, in the relevant mass range, TaylorF2 has a match and fitting factor close to unity with Effective One-Body waveforms modified for optimal agreement with numerical simulations [57]. Spins are unlikely to be very important in this case. One might worry about finite size effects, but as shown in [58], even for the most extreme neutron star equations of state and for sources as close as 100 Mpc, advanced detectors will not be sensitive to these at frequencies below ~400 Hz; hence one could cut off the recovery waveforms at 400 Hz, in which case the loss in SNR would be less than a percent. However, if one also wanted to test GR using systems composed of a neutron star and a black hole, or two black holes, then dynamical spins [51], sub-dominant signal harmonics [36, 52] and merger/ringdown [53] would become important, and in that case more sophisticated waveform models will be needed. The latter is currently the subject of investigation.



2.3. Defining the odds ratio 

Focussing our attention on deviations from GR in the phase of the measured waveform, i.e. keeping the amplitude fixed to its GR-predicted value, we consider within the Bayesian model selection framework the following two hypotheses:

• $\mathcal{H}_{\rm GR}$: The waveform has a phase with the functional dependence on $(\mathcal{M},\eta)$ as predicted by GR;

• $\mathcal{H}_{\rm modGR}$: One or more of the phase coefficients do not have the functional dependence on $(\mathcal{M},\eta)$ as predicted by GR.

The GR hypothesis, $\mathcal{H}_{\rm GR}$, is the hypothesis that our GR waveform model (TaylorF2 in this case) correctly describes the signal originating from the inspiral of two compact objects. Ideally, $\mathcal{H}_{\rm modGR}$ would simply have been the negation of $\mathcal{H}_{\rm GR}$. However, a priori, deviations from GR can occur in an infinite number of ways. What we will argue is that for the core question that we want to address, i.e. whether or not the observed phase deviates from GR, it will be sufficient to allow for a limited set of possible deviations in the recovery waveforms. For TaylorF2, we take the set of deviations only to be within the known phase coefficients $\{\psi_0, \psi_1, \psi_2, \ldots\}$.

To date, the TaylorF2 phase has ten known phase coefficients ($\psi_0, \ldots, \psi_7$, and two additional coefficients $\psi_5^{(l)}$ and $\psi_6^{(l)}$ associated with logarithmic contributions). Here we will not use $\psi_0$ as a variable coefficient; even so, if one were to consider all the subsets of the set of remaining coefficients, one would have to take into account $2^9 - 1 = 511$ ways in which a deviation can occur. Apart from this being computationally demanding, we do not expect to be sensitive to the highest-order coefficients; hence it makes sense to limit oneself to all the subsets of

$$\{\psi_1, \psi_2, \ldots, \psi_{N_T}\}, \tag{6}$$

where $N_T$ is the number of phase coefficients one chooses to consider. We thus allow one or more of the coefficients $\{\psi_1, \psi_2, \ldots, \psi_{N_T}\}$ to vary freely, instead of following the functional dependence on $(\mathcal{M},\eta)$ as predicted by GR. The choice of $N_T$ will in part be influenced by the required generality of the test, measurability of phase coefficients, and computational limitations.
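The counting above can be checked by enumerating the non-empty subsets directly. The sketch below (the helper name is hypothetical) lists the labels of the sub-hypotheses for a given set of testing coefficients; with $N_T = 3$ it gives the $2^3 - 1 = 7$ sub-hypotheses used in section 3, and with the nine coefficients other than $\psi_0$ it gives 511.

```python
from itertools import combinations

def sub_hypotheses(coefficients):
    """All non-empty subsets {i1 < i2 < ... < ik} of the testing coefficients.

    Each tuple labels one auxiliary hypothesis: exactly those coefficients
    are free, all others keep their GR form.
    """
    labels = sorted(coefficients)
    return [subset
            for k in range(1, len(labels) + 1)
            for subset in combinations(labels, k)]

print(sub_hypotheses([1, 2, 3]))
# [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

# Nine coefficients besides psi_0 (labels are illustrative strings)
nine = ["psi1", "psi2", "psi3", "psi4", "psi5", "psi6", "psi7", "psi5l", "psi6l"]
print(len(sub_hypotheses(nine)))  # 511
```

The number of sub-hypotheses, and hence of evidence calculations, grows as $2^{N_T}$, which is the computational reason for keeping $N_T$ small.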

Finally, we quantify our belief in whether one or more phase coefficients deviate from GR by means of auxiliary hypotheses $\mathcal{H}_{i_1 i_2 \ldots i_k}$, which are defined as follows:

$\mathcal{H}_{i_1 i_2 \ldots i_k}$ is the hypothesis that the phasing coefficients $\psi_{i_1}, \ldots, \psi_{i_k}$ do not have the functional dependence on $(\mathcal{M},\eta)$ as predicted by General Relativity, but all other coefficients $\psi_j$, $j \notin \{i_1, i_2, \ldots, i_k\}$, do have the dependence as in GR.

It is important to note that by definition of the hypotheses $\mathcal{H}_{i_1 i_2 \ldots i_k}$, they are mutually, logically disjoint, i.e., $\mathcal{H}_{i_1 i_2 \ldots i_k} \wedge \mathcal{H}_{j_1 j_2 \ldots j_l}$ is always false for $\{i_1, i_2, \ldots, i_k\} \neq \{j_1, j_2, \ldots, j_l\}$.

For a signal to be inconsistent with GR, we require that one or more phase coefficients deviate from GR. In terms of hypotheses, we are thus interested in the logical 'or' of the sub-hypotheses $\mathcal{H}_{i_1 i_2 \ldots i_k}$ defined above. With this in hand, we can now define $\mathcal{H}_{\rm modGR}$ to be:

$$\mathcal{H}_{\rm modGR} = \bigvee_{i_1 < i_2 < \ldots < i_k;\, k \leq N_T} \mathcal{H}_{i_1 i_2 \ldots i_k}. \tag{7}$$

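The odds ratio for $N_T$ coefficients is given by:

$${}^{(N_T)}\mathcal{O}^{\rm modGR}_{\rm GR} \equiv \frac{P(\mathcal{H}_{\rm modGR}|d,I)}{P(\mathcal{H}_{\rm GR}|d,I)} = \frac{P\left(\bigvee_{i_1 < i_2 < \ldots < i_k;\, k \leq N_T} \mathcal{H}_{i_1 i_2 \ldots i_k} \,\middle|\, d,I\right)}{P(\mathcal{H}_{\rm GR}|d,I)}. \tag{8}$$

Using the fact that the auxiliary hypotheses are mutually, logically disjoint, one can write

$$P\left(\bigvee_{i_1 < i_2 < \ldots < i_k;\, k \leq N_T} \mathcal{H}_{i_1 i_2 \ldots i_k} \,\middle|\, d,I\right) = \sum_{k=1}^{N_T} \sum_{i_1 < i_2 < \ldots < i_k} P(\mathcal{H}_{i_1 i_2 \ldots i_k}|d,I). \tag{9}$$

Applying Bayes' theorem, one finds

$${}^{(N_T)}\mathcal{O}^{\rm modGR}_{\rm GR} = \sum_{k=1}^{N_T} \sum_{i_1 < i_2 < \ldots < i_k} \frac{P(\mathcal{H}_{i_1 i_2 \ldots i_k}|I)}{P(\mathcal{H}_{\rm GR}|I)}\, B^{i_1 i_2 \ldots i_k}_{\rm GR}, \tag{10}$$

where

$$B^{i_1 i_2 \ldots i_k}_{\rm GR} = \frac{P(d|\mathcal{H}_{i_1 i_2 \ldots i_k},I)}{P(d|\mathcal{H}_{\rm GR},I)}. \tag{11}$$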



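At this point, one has to set the values for the relative prior probabilities, $P(\mathcal{H}_{i_1 i_2 \ldots i_k}|I)/P(\mathcal{H}_{j_1 j_2 \ldots j_l}|I)$. When one is devoid of prior information as to which of the test coefficients are inconsistent with GR, one can choose to invoke total ignorance and assign to each an equal weight, i.e.

$$\frac{P(\mathcal{H}_{i_1 i_2 \ldots i_k}|I)}{P(\mathcal{H}_{j_1 j_2 \ldots j_l}|I)} = 1 \quad \text{for all } \{i_1, \ldots, i_k\},\ \{j_1, \ldots, j_l\}. \tag{12}$$

Despite the choice of total ignorance, however, one more quantity needs to be set. The overall relative prior, $P(\mathcal{H}_{\rm modGR}|I)/P(\mathcal{H}_{\rm GR}|I)$, describes the prior belief in whether GR is the correct theory or not. The choice of this quantity is left to the reader. For convenience, however, we write

$$\frac{P(\mathcal{H}_{\rm modGR}|I)}{P(\mathcal{H}_{\rm GR}|I)} = \alpha. \tag{13}$$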

As will become apparent below, $\alpha$ will end up being just an overall scaling of the odds ratio. Later on, for the purposes of showing results, we will set $\alpha = 1$.



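The equality (13), together with (12) and the logical disjointness of the $2^{N_T} - 1$ hypotheses $\mathcal{H}_{i_1 i_2 \ldots i_k}$, implies

$$\frac{P(\mathcal{H}_{i_1 i_2 \ldots i_k}|I)}{P(\mathcal{H}_{\rm GR}|I)} = \frac{\alpha}{2^{N_T} - 1}. \tag{14}$$

In terms of the $\mathcal{H}_{i_1 i_2 \ldots i_k}$, the odds ratio can then be written as

$${}^{(N_T)}\mathcal{O}^{\rm modGR}_{\rm GR} = \frac{\alpha}{2^{N_T} - 1} \sum_{k=1}^{N_T} \sum_{i_1 < i_2 < \ldots < i_k} B^{i_1 i_2 \ldots i_k}_{\rm GR}. \tag{15}$$

Up to an overall prefactor, the odds ratio is thus a straightforward average of the Bayes factors from the individual sub-hypotheses $\mathcal{H}_{i_1 i_2 \ldots i_k}$.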
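Bayes factors of loud signals can easily overflow floating-point arithmetic, so it is convenient to evaluate Eq. (15) from log Bayes factors with a log-sum-exp. A minimal sketch, assuming the $2^{N_T} - 1$ values $\ln B^{i_1 \ldots i_k}_{\rm GR}$ are already available (for instance from nested sampling runs); the input numbers are made up for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    vmax = max(values)
    return vmax + math.log(sum(math.exp(v - vmax) for v in values))

def log_odds_modgr(log_bayes_factors, alpha=1.0):
    """ln O^{modGR}_{GR} of Eq. (15) from the ln B of all sub-hypotheses."""
    n_hyp = len(log_bayes_factors)        # should equal 2**N_T - 1
    return math.log(alpha / n_hyp) + log_sum_exp(log_bayes_factors)

# Hypothetical ln B for the 7 sub-hypotheses of N_T = 3
ln_b = [0.4, -0.2, 1.1, 0.3, 0.9, 0.5, 0.7]
print(log_odds_modgr(ln_b))
```

Because $\alpha$ enters only through the additive term $\ln\alpha$, changing it shifts every log odds ratio by the same amount.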

2.4. Multiple sources

Although the detection rate for compact binary coalescences is still rather uncertain, we expect advanced instruments to detect several events per year [18]. It is therefore important to take advantage of multiple detections to provide tighter constraints on the validity of GR.

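The extension of the odds ratio to include observations from several independent sources can be found in [41, 49]. Here we simply state the result, referring the interested reader to these papers for details. If one assumes $N$ independent measurements and the events are labelled by $A$, one can write the odds ratio as

$${}^{(N_T)}\mathcal{O}^{\rm modGR}_{\rm GR} = \frac{\alpha}{2^{N_T} - 1} \sum_{k=1}^{N_T} \sum_{i_1 < i_2 < \ldots < i_k} \prod_{A=1}^{N} {}^{(A)}B^{i_1 i_2 \ldots i_k}_{\rm GR}, \tag{16}$$

where

$${}^{(A)}B^{i_1 i_2 \ldots i_k}_{\rm GR} = \frac{P(d_A|\mathcal{H}_{i_1 i_2 \ldots i_k},I)}{P(d_A|\mathcal{H}_{\rm GR},I)}, \tag{17}$$

with $d_A$ being the data associated with the $A$th detection.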
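Eq. (16) multiplies, for each sub-hypothesis separately, the Bayes factors of all sources, and only then sums over sub-hypotheses; this is not the same as multiplying the per-source odds ratios. A sketch working in logarithms, assuming a table `ln_b[A][h]` of per-source log Bayes factors (rows: sources $A$, columns: sub-hypotheses); the values are invented for illustration and the function names are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    vmax = max(values)
    return vmax + math.log(sum(math.exp(v - vmax) for v in values))

def combined_log_odds(ln_b, alpha=1.0):
    """ln of Eq. (16); ln_b[A][h] = ln B^{h}_{GR} for source A, sub-hypothesis h."""
    n_hyp = len(ln_b[0])
    # Product over sources of B for each sub-hypothesis -> sum of logs
    per_hypothesis = [sum(row[h] for row in ln_b) for h in range(n_hyp)]
    return math.log(alpha / n_hyp) + log_sum_exp(per_hypothesis)

def product_of_single_odds(ln_b, alpha=1.0):
    """For contrast only: ln of the product of per-source odds (not Eq. (16))."""
    n_hyp = len(ln_b[0])
    return sum(math.log(alpha / n_hyp) + log_sum_exp(row) for row in ln_b)

ln_b = [
    [1.0, -0.5, 0.2],   # hypothetical source 1 (N_T = 2 -> 3 sub-hypotheses)
    [0.8, -1.0, 0.5],   # hypothetical source 2
]
print(combined_log_odds(ln_b), product_of_single_odds(ln_b))
```

The two numbers differ because Eq. (16) assumes that the same sub-hypothesis holds for every source, so evidence accumulates coherently within each sub-hypothesis.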



2.5. Noise 

From a theoretical point of view, the data favour the hypothesis $\mathcal{H}_{\rm modGR}$ over the hypothesis $\mathcal{H}_{\rm GR}$ when $\mathcal{O}^{\rm modGR}_{\rm GR} > 1$. The relative degree of belief in the two hypotheses is encapsulated in the magnitude of the odds ratio. However, in the case of advanced ground-based detectors, the signals will be buried deep inside the noise. This introduces the problem that the noise itself can mimic a non-negligible deviation from GR.

Hence we need to study the effect of noise on the odds ratio. For this purpose, we constructed a so-called background, i.e. a distribution of log odds ratios from a large number of catalogues, collectively denoted by $\kappa$, of simulated signals consistent with $\mathcal{H}_{\rm GR}$ and embedded within noise. The background distribution $P(\ln\mathcal{O}^{\rm modGR}_{\rm GR}|\kappa, \mathcal{H}_{\rm GR}, I)$ of log odds ratios for catalogues of GR sources can be seen as the blue dotted histogram in the right-hand panel of Fig. 2 below.

In the advanced detector era, one will only have access to a single catalogue of detected foreground events. The associated measured log odds ratio should subsequently be compared with the background distribution in order to quantify our belief in a deviation from GR. To do this, in Sec. 3.3 we will introduce a maximum tolerable false alarm probability $\beta$, which together with the background distribution sets a threshold $\ln\mathcal{O}_\beta$ for the measured log odds ratio to overcome.

For specific examples of GR violations, we will want to know how likely it is that the catalogue of foreground sources will have an odds ratio that is above threshold. For this reason we will also simulate large numbers of foreground catalogues, collectively denoted by $\kappa'$. For a given false alarm probability $\beta$, one can then calculate what fraction of the simulated foreground catalogues has a log odds ratio above the associated threshold $\ln\mathcal{O}_\beta$; this fraction we will call the efficiency.

2.6. Implementation 

A few remarks have to be made regarding the implementation of the aforementioned method. 
First, we use $N_T = 3$ testing parameters $\{\psi_1, \psi_2, \psi_3\}$. The varying of these phase coefficients was parameterised in the following fashion:

$$\psi_i = \psi_i^{\rm GR}(\mathcal{M},\eta)\,[1 + \delta\chi_i], \tag{18}$$

with $\psi_i^{\rm GR}(\mathcal{M},\eta)$ the functional form of the dependence of $\psi_i$ on $(\mathcal{M},\eta)$ according to GR, and the dimensionless $\delta\chi_i$ a fractional shift in $\psi_i$. The 0.5PN coefficient $\psi_1$, however, cannot be implemented in a similar way, as GR predicts $\psi_1^{\rm GR} = 0$. Instead, deviations from $\psi_1^{\rm GR}$ are modelled as

$$\psi_1(\mathcal{M},\eta) = \frac{3}{128\eta}\,(\pi M)^{-4/3}\,\delta\chi_1, \tag{19}$$

and the interpretation of a fractional shift is not adequate; rather, $\delta\chi_1$ is related to the magnitude of the deviation itself.
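The parameterisation of Eqs. (18) and (19) can be sketched as a function that takes the GR coefficients and a dictionary of shifts $\delta\chi_i$. It assumes the GR coefficients have the standard TaylorF2 form $\psi_i^{\rm GR} = \frac{3}{128\eta}(\pi M)^{(i-5)/3}\alpha_i$ (geometric units, $M$ in seconds), which is consistent with Eq. (19) but is an assumption of this sketch; the function names are hypothetical.

```python
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds

def gr_coefficients(m_total, eta):
    """Assumed GR TaylorF2 coefficients psi_0..psi_3 (standard form)."""
    alpha = [1.0, 0.0, 3715.0 / 756.0 + 55.0 * eta / 9.0, -16.0 * math.pi]
    pre = 3.0 / (128.0 * eta)
    return [pre * (math.pi * m_total) ** ((i - 5) / 3.0) * a for i, a in enumerate(alpha)]

def deformed_coefficients(psi_gr, delta_chi, m_total, eta):
    """Apply Eq. (18) to psi_i (i >= 2) and Eq. (19) to psi_1.

    psi_gr:    list of GR coefficients psi_0, psi_1, ...
    delta_chi: dict {i: delta_chi_i} for the coefficients allowed to vary
    """
    psi = list(psi_gr)
    for i, dchi in delta_chi.items():
        if i == 1:
            # GR gives psi_1 = 0, so delta_chi_1 sets the magnitude itself (Eq. 19)
            psi[1] = 3.0 / (128.0 * eta) * (math.pi * m_total) ** (-4.0 / 3.0) * dchi
        else:
            # Fractional shift of the GR value (Eq. 18)
            psi[i] = psi_gr[i] * (1.0 + dchi)
    return psi

m_total, eta = 3.0 * MSUN_S, 0.25          # (1.5, 1.5) Msun system
psi_gr = gr_coefficients(m_total, eta)
psi = deformed_coefficients(psi_gr, {1: 0.1, 3: 0.05}, m_total, eta)
print(psi_gr[1], psi[1], psi[3] / psi_gr[3])
```

Coefficients that are not listed in `delta_chi` keep their GR values, mirroring the definition of the sub-hypotheses $\mathcal{H}_{i_1 \ldots i_k}$.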



For the computation of the odds ratio defined in Eq. (15) and Eq. (16), one needs to compute the relevant Bayes factors via the evidences. In high-dimensional problems, brute-force methods to calculate the integral in Eq. (3) are computationally too expensive. One can, however, make use of more efficient methods to make this calculation computationally feasible. In this paper, we resort to an algorithm called Nested Sampling [59]. More specifically, an implementation tailored to ground-based observations of coalescing binaries by Veitch and Vecchio [60-62] was used.
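For readers unfamiliar with Nested Sampling, the following is a deliberately naive sketch of Skilling's algorithm on a one-dimensional toy problem; it is not the LAL implementation used in this paper. $N$ live points are drawn from a flat prior, the lowest-likelihood point is repeatedly replaced by a new prior draw with higher likelihood (found here by plain rejection sampling, which is only viable for toy problems), and the enclosed prior volume is assumed to shrink as $X_i \approx e^{-i/N}$. The toy likelihood and prior are assumptions chosen so that the exact evidence, essentially 0.5, is known; the estimate of $\ln Z$ typically scatters by roughly $\sqrt{H/N}$, with $H$ the information gained from prior to posterior.

```python
import math
import random

def log_add(a, b):
    """Stable ln(exp(a) + exp(b)); also handles -inf."""
    if a < b:
        a, b = b, a
    if a == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))

def nested_sampling(log_like, prior_draw, n_live=200, tol=1e-2, seed=1):
    """Return an estimate of ln Z = ln integral of prior * likelihood."""
    rng = random.Random(seed)
    live = [prior_draw(rng) for _ in range(n_live)]
    live_logl = [log_like(x) for x in live]
    log_z = -math.inf
    log_x_prev = 0.0                         # ln X_0 = ln 1
    i = 0
    while True:
        i += 1
        worst = min(range(n_live), key=live_logl.__getitem__)
        log_l_star = live_logl[worst]
        log_x = -i / n_live                  # deterministic estimate ln X_i = -i/N
        # ln(X_{i-1} - X_i): weight of the shell being discarded
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = log_add(log_z, log_l_star + log_w)
        # Replace the worst point by a prior draw with L > L* (rejection sampling)
        while True:
            x_new = prior_draw(rng)
            logl_new = log_like(x_new)
            if logl_new > log_l_star:
                break
        live[worst], live_logl[worst] = x_new, logl_new
        log_x_prev = log_x
        # Stop once the remaining prior volume cannot change Z by more than ~tol
        if max(live_logl) + log_x < math.log(tol) + log_z:
            break
    # Add the contribution of the remaining live points: X_final * mean(L_live)
    mean_l = sum(math.exp(ll) for ll in live_logl) / n_live
    return log_add(log_z, log_x_prev + math.log(mean_l))

SIGMA = 0.1

def log_like(x):
    """Toy Gaussian likelihood of width SIGMA centred on zero."""
    return -0.5 * (x / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2.0 * math.pi))

def prior_draw(rng):
    """Flat prior on [-1, 1] (density 1/2)."""
    return rng.uniform(-1.0, 1.0)

log_z = nested_sampling(log_like, prior_draw)
print(f"ln Z estimate {log_z:.3f}, exact {math.log(0.5):.3f}")
```

Production codes replace the rejection step with far more efficient constrained sampling (for example, Markov chains started from a surviving live point), which is what makes the method practical in the many-dimensional parameter space of a binary signal.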

Both the model waveforms and the Nested Sampling algorithm were appropriately adapted from existing code in the LIGO Algorithms Library [56].



3. Results 

In this section, we want to lend further support to the claim in [49] that the method is in principle sensitive to deviations that are not considered within the model waveforms, as long as the phase shift at $f \sim 150$ Hz, where the detectors are the most sensitive, is comparable to, say, a shift of $\delta\chi_3 \sim (\text{a few}) \times 10^{-2}$ at 1.5PN order (corresponding to a shift in the overall phase of ~5 radians at the given frequency). To this end we use two heuristic examples where the change in phase of the signals cannot be accommodated by the model waveforms, yet the deviation from GR turns out to be detectable.

The first example, in subsection 3.1, considers an additional term in the phase associated with a power of frequency which itself depends on the total mass of the system. This power is chosen in such a way that within the range of total masses we consider, the frequency dependence of the anomalous contribution varies from effectively being 0.5PN at the lower end to 1.5PN at the higher end. Clearly, our model waveforms are in no way designed to capture such a deviation from GR. The second case, in subsection 3.2, considers a deviation at a PN order (2PN) that is higher than the orders at which we allow phase coefficients to vary in our model waveforms (0.5PN, 1PN, and 1.5PN).

After presenting the main results, we study the effect of the number of detected sources on our confidence in a deviation from GR. For this investigation we use the example in subsection 3.2.



From Fisher matrix analyses, it has been shown that the phase coefficients in Eq. (5) are best measured as the total mass of the system goes down [46-48]. Therefore, the signals were chosen to originate from neutron stars with masses between 1 $M_\odot$ and 2 $M_\odot$. For such systems, it has been shown that contributions from the spin interactions and the sub-dominant signal harmonics are negligible, and that merger/ringdown does not have a significant impact [36, 63, 64].

The aim is to simulate the situation at Advanced Virgo and LIGO as realistically as possible. We have assumed an advanced detector network with detectors at Hanford and Livingston, both with the Advanced LIGO noise curve [65], and a detector at Cascina with the Advanced Virgo noise curve. Three data streams were produced, containing stationary, Gaussian noise coloured by these respective noise curves, to which simulated signals were added. Events were placed uniformly in volume (i.e. probability density proportional to $D^2$, where $D$ is the luminosity distance), between 100 Mpc and 400 Mpc, to reflect the estimates of the number of detectable sources and the appropriate horizon distance. A lower cut-off of 8 was imposed on the network SNR, defined as the quadrature sum of the individual detector SNRs, so as to be consistent with the LIGO/Virgo minimum for an event to be claimed as detected.
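The source population described above can be sketched as follows: distances drawn with density $\propto D^2$ between 100 and 400 Mpc by inverting the cumulative distribution, and a network SNR formed as the quadrature sum of per-detector SNRs with a cut-off of 8. The per-detector SNR model (a crude $1/D$ scaling with a made-up reference SNR and random orientation factors) is a hypothetical placeholder, not the calculation used in this paper, which injects full waveforms into coloured noise.

```python
import math
import random

D_MIN, D_MAX = 100.0, 400.0   # distance range [Mpc]
SNR_THRESHOLD = 8.0           # network SNR cut-off

def draw_distance(rng):
    """Sample D with p(D) proportional to D^2 on [D_MIN, D_MAX] (inverse CDF)."""
    u = rng.random()
    return (D_MIN**3 + u * (D_MAX**3 - D_MIN**3)) ** (1.0 / 3.0)

def network_snr(detector_snrs):
    """Network SNR: quadrature sum of the individual detector SNRs."""
    return math.sqrt(sum(s * s for s in detector_snrs))

def toy_detector_snrs(distance, rng, snr_at_100mpc=15.0):
    """Hypothetical placeholder: SNR ~ 1/D times a random orientation factor,
    for three detectors (Hanford, Livingston, Cascina)."""
    return [snr_at_100mpc * (100.0 / distance) * rng.uniform(0.2, 1.0)
            for _ in range(3)]

rng = random.Random(42)
detected = []
for _ in range(10000):
    dist = draw_distance(rng)
    rho = network_snr(toy_detector_snrs(dist, rng))
    if rho >= SNR_THRESHOLD:   # keep events that would be claimed as detections
        detected.append((dist, rho))
print(f"{len(detected)} of 10000 toy events pass the SNR cut")
```

Sampling uniformly in volume puts most events near the outer edge of the distance range, so the SNR cut removes a large fraction of them; the surviving catalogue is therefore dominated by sources close to threshold.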

The waveforms were chosen to go up to 2PN in phase both for the injected signals and the model waveforms. The test coefficients were taken to be $\psi_1$, $\psi_2$ and $\psi_3$, so that the hypothesis $\mathcal{H}_{\rm modGR}$ contains $2^{N_T} - 1 = 7$ logically disjoint sub-hypotheses.

The priors given to the deviations $\delta\chi_i$ were chosen to be flat and centered around zero, with a total width of 0.5. The priors on the remaining parameters were taken to be the same as in [60], with the exception that the distance is allowed to go up to 1000 Mpc.

It should be stressed that the choice of waveform approximant, test coefficients, and priors 
on the deviations were, to a large extent, arbitrary. In the advanced detector era, one would 
seek to perform the most general test that computational resources will allow. This will include 
the most accurate waveforms available at that time, the highest number of test coefficients one 
can handle, and the least restrictive priors that are in accordance with our prior information at 
that moment. 

3.1. A deviation with a mass dependent power of frequency 

In our first example, the signals are given a deviation in the phase that has a mass-dependent frequency power. Specifically, the deviation is of the form:



(20) 



where $M$ denotes the total mass of the binary system. We note that for a system with component masses in the middle of our range, $(1.5, 1.5)\,M_\odot$, the change in phase at $f = 150$ Hz is about the same as for a 10% shift in $\psi_3$. More precisely, for these masses the change in $\Psi(\mathcal{M},\eta; 150\,\text{Hz})$ is 13.3 radians, to be compared with the 12.8 radians change induced by a constant 10% shift in $\psi_3$.
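The size of a 10% shift in $\psi_3$ can be estimated directly, assuming the standard TaylorF2 form of the GR 1.5PN term, $\psi_3 f^{-2/3} = \frac{3}{128\eta}(-16\pi)(\pi M f)^{-2/3}$, in geometric units with $M$ in seconds. This simple estimate gives about 12.9 rad for a $(1.5, 1.5)\,M_\odot$ system at 150 Hz, close to the 12.8 rad quoted above; we have not tried to match the exact constants or conventions behind the quoted figure.

```python
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds

def psi3_term(m1_msun, m2_msun, f):
    """Assumed GR 1.5PN term psi_3 f^(-2/3) = (3/(128 eta)) (-16 pi) (pi M f)^(-2/3)."""
    m_total = (m1_msun + m2_msun) * MSUN_S
    eta = m1_msun * m2_msun / (m1_msun + m2_msun) ** 2
    return 3.0 / (128.0 * eta) * (-16.0 * math.pi) * (math.pi * m_total * f) ** (-2.0 / 3.0)

term = psi3_term(1.5, 1.5, 150.0)
shift = abs(0.10 * term)   # phase change for delta_chi_3 = 0.1, Eq. (18)
print(f"psi_3 term: {term:.1f} rad, 10% shift: {shift:.1f} rad")
```

Because Eq. (18) is a multiplicative shift, the phase change is simply $\delta\chi_3$ times the GR 1.5PN term at the chosen frequency.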

In order to assess the statistics of the odds ratio, a large number of signals were simulated, with the parameter distribution explained above. For each of the signals, we calculated the odds ratio as defined in Eq. (15). The distribution of the odds ratio as a function of SNR can be seen in Fig. 1. The separation between 'foreground' and 'background' is more or less complete already below SNR ~15.
Figure 1. Distribution of log odds ratios as a function of the optimal network SNR. The crosses are for sources whose GW emission is in accordance with GR, while the circles are for sources with a deviation as in Eq. (20). For the GR sources, the log odds ratios are concentrated tightly around zero, while for sources with the deviation in phase, they increase as a function of SNR. This is in agreement with the expectation that parameter estimation improves as the SNR increases. As our parameter estimation becomes better, deviations become more pronounced, which increases our confidence in a deviation from GR.



In the left panel of Fig. 2, the odds ratios for the sources with a deviation from GR are compared with a 'background' distribution (in the sense defined above). Next we collected sources into 'catalogues' of 15 sources each and computed the combined odds ratio of Eq. (16) for all of these catalogues; the distributions of these odds ratios for 'background' and 'foreground' are shown in the right panel of Fig. 2. Clearly, the ability to combine information from multiple sources is a powerful tool in increasing one's confidence in a violation of GR. For the given deviation, a violation of GR can be established with near-certainty.



3.2. A deviation at a higher PN order than the testing coefficients 

Our testing coefficients are $\psi_1$, $\psi_2$, $\psi_3$, so that the model waveforms can only have shifts in PN phase contributions up to 1.5PN order. To show that we can nevertheless be sensitive to anomalies at higher PN order, we now consider signals with a constant shift at 2PN. We note in passing that theories with quadratic curvature terms in the action tend to introduce extra contributions at 2PN [66-68].
Thus, we consider injections with

$$\psi_4 = \psi_4^{\rm GR}(\mathcal{M},\eta)\,[1 + \delta\chi_4], \tag{21}$$



where the magnitude is set to $\delta\chi_4 = 0.2$. For comparison, at $f = 150$ Hz and for a system with component masses $(1.5, 1.5)\,M_\odot$, the change in the phase caused by such a deviation is comparable to the one caused by a negative relative shift in $\psi_3$ of 3.5% (namely, the shift in $\Psi(\mathcal{M},\eta; f)$ at $f = 150$ Hz is ~4.5 radians).

Figure 2. Left: Normalised distributions of log odds ratios for individual sources, where the signals are in accordance with $\mathcal{H}_{\rm GR}$ (blue dotted) or have a shift with a mass-dependent frequency behaviour given in Eq. (20) (red striped). Right: Normalised distributions of logs of the combined odds ratios for the same injections as in the left panel, but collected into independent catalogues of 15 sources each. The effect of combining sources is to separate the distributions of GR injections and anomalous injections, increasing one's confidence in a deviation from GR.
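The comparison between the 2PN deviation and a shift in $\psi_3$ can be checked with the same kind of estimate, assuming the standard GR TaylorF2 1.5PN and 2PN terms, $\frac{3}{128\eta}(-16\pi)(\pi M f)^{-2/3}$ and $\frac{3}{128\eta}\alpha_4(\pi M f)^{-1/3}$ with $\alpha_4 = 15293365/508032 + 27145\eta/504 + 3085\eta^2/72$. The sketch gives a phase change of about 4.5 rad and an equivalent $\delta\chi_3$ of about $-0.035$, in line with the figures quoted above; the function name is hypothetical.

```python
import math

MSUN_S = 4.925491e-6  # G * M_sun / c^3 in seconds

def pn_terms(m1_msun, m2_msun, f):
    """Assumed GR 1.5PN and 2PN contributions to Eq. (5) at frequency f [Hz]."""
    m_total = (m1_msun + m2_msun) * MSUN_S
    eta = m1_msun * m2_msun / (m1_msun + m2_msun) ** 2
    x = math.pi * m_total * f                       # (pi M f) = v^3
    pre = 3.0 / (128.0 * eta)
    alpha4 = 15293365.0 / 508032.0 + 27145.0 * eta / 504.0 + 3085.0 * eta**2 / 72.0
    term3 = pre * (-16.0 * math.pi) * x ** (-2.0 / 3.0)   # psi_3 f^(-2/3)
    term4 = pre * alpha4 * x ** (-1.0 / 3.0)               # psi_4 f^(-1/3)
    return term3, term4

term3, term4 = pn_terms(1.5, 1.5, 150.0)
dphi = 0.2 * term4              # phase change for delta_chi_4 = 0.2, Eq. (21)
equiv_dchi3 = dphi / term3      # delta_chi_3 producing the same change
print(f"2PN shift: {dphi:.2f} rad, equivalent delta_chi_3: {equiv_dchi3:.3f}")
```

The equivalent shift is negative because the GR 1.5PN term is itself negative while the 2PN term is positive.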

Fig. 3 shows the odds ratio as a function of the optimal SNR, both for GR injections and anomalous ones. This time, as opposed to the example considered in subsection 3.1, separation between the odds ratios of signals with the deformations of Eq. (21) and the noise-induced distribution of odds ratios for GR injections becomes apparent only at SNR ~20. This can be attributed to the fact that the deviations are in more subdominant contributions to the phase compared to the case considered earlier.
Figure 3. The distribution of odds ratio as a function of the optimal network SNR for GR injections (crosses) and injections with deviations from GR as in Eq. (21) (circles). For the anomalous injections, the odds ratio increases as a function of SNR.



In Fig. 4 we show the odds ratio for individual sources and for random catalogues with 15 sources each. For individual sources, the separation between the background and the foreground is present but weak. However, when one assumes random catalogues of 15 sources each, the separation becomes very significant. This further illustrates the importance of combining information from multiple sources.




Figure 4. Left: Normalised distribution of log odds ratios for individual sources, where the signals are in accordance with $\mathcal{H}_{\rm GR}$ (blue dotted) or have a deviation of the form given in Eq. (21) (red striped). Right: Normalised distribution of logs of the combined odds ratios for the same signals as in the left panel, but randomly arranged in catalogues of 15 sources each. In this case the effect of combining sources is profound: only a small difference between background and foreground is visible when considering individual sources, whereas for catalogues of 15 sources the differentiation becomes significant.



3.3. Effect of catalogue size 

We have seen in the previous section that constructing the odds ratio from a catalogue of sources 
greatly increases the confidence in a deviation from GR. However, the rates of binary inspiral 
observed in the advanced detector era are highly uncertain. It is therefore instructive to study 
the effect of the catalogue size on our confidence in detecting a deviation. 

To characterise such a confidence, we introduce the concept of efficiency. Assume one 
has two distributions of log odds ratios: the background distribution 
$P(\ln O^{\rm modGR}_{\rm GR} \mid \mathcal{K}, \mathcal{H}_{\rm GR}, I)$, obtained when the simulated 
catalogues, collectively denoted by $\mathcal{K}$, are in agreement with $\mathcal{H}_{\rm GR}$, and the 
foreground distribution $P(\ln O^{\rm modGR}_{\rm GR} \mid \mathcal{K}', \mathcal{H}_{\rm alt}, I)$, obtained 
when the simulated catalogues $\mathcal{K}'$ adhere to some alternative theory $\mathcal{H}_{\rm alt}$. Now 
choose a maximum tolerable false alarm probability $\beta$. This sets a threshold $\ln O_\beta$ for the 
measured log odds ratio to overcome, as follows: 

$$\beta = \int_{\ln O_\beta}^{\infty} P(\ln O \mid \mathcal{K}, \mathcal{H}_{\rm GR}, I)\, \mathrm{d}\ln O . \tag{22}$$

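We now define the efficiency, $\varepsilon$, as the fraction of the foreground with a false alarm 
probability of $\beta$ or less, i.e. the portion that lies above the threshold $\ln O_\beta$: 

$$\varepsilon = \int_{\ln O_\beta}^{\infty} P(\ln O \mid \mathcal{K}', \mathcal{H}_{\rm alt}, I)\, \mathrm{d}\ln O . \tag{23}$$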

The efficiency can be viewed as the chance that, if there is a deviation from GR corresponding to 
$\mathcal{H}_{\rm alt}$, the catalogue of sources that is actually detected will have a log odds ratio above 
threshold, i.e. that it will have a false alarm probability of $\beta$ or less. Note that with these 
definitions, the efficiency is independent of the overall prior odds ratio $\alpha$ in Eqs. (13) and (16), 
as $\alpha$ shifts the background distribution, the threshold $\ln O_\beta$, and the foreground 
distribution by the same amount, $\ln\alpha$. 
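In practice the integrals in Eqs. (22) and (23) are estimated from a finite set of simulated catalogues. A minimal sketch, assuming the background and foreground are given as plain lists of catalogue log odds ratios (the Gaussian samples below are synthetic stand-ins, not the simulations of this paper): the threshold $\ln O_\beta$ is taken as the value above which at most a fraction $\beta$ of the background lies, and $\varepsilon$ is the fraction of the foreground strictly above it. The last lines check numerically that a common shift by $\ln\alpha$ leaves $\varepsilon$ unchanged. The function names and the value of $\alpha$ are hypothetical.

```python
import math
import random

def log_odds_threshold(background, beta):
    """Empirical ln O_beta: at most a fraction beta of the background lies above it."""
    values = sorted(background)
    n = len(values)
    allowed = math.floor(beta * n + 1e-9)  # background samples allowed above threshold
    if not 0 <= allowed < n:
        raise ValueError("need 0 <= beta < 1")
    return values[n - allowed - 1]

def efficiency(foreground, threshold):
    """Fraction of foreground log odds strictly above the threshold, cf. Eq. (23)."""
    return sum(x > threshold for x in foreground) / len(foreground)

# Synthetic background (GR) and foreground (non-GR) catalogue log odds.
rng = random.Random(1)
background = [rng.gauss(-1.0, 1.0) for _ in range(2000)]
foreground = [rng.gauss(2.0, 1.0) for _ in range(2000)]

beta = 0.05
eps = efficiency(foreground, log_odds_threshold(background, beta))

# Changing the prior odds alpha adds ln(alpha) to every log odds ratio.
log_alpha = math.log(0.3)  # hypothetical prior odds
eps_shifted = efficiency(
    [x + log_alpha for x in foreground],
    log_odds_threshold([x + log_alpha for x in background], beta),
)
print(eps, eps_shifted)
```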



In Fig. 5 we show the efficiency $\varepsilon$ for the example shown in subsection 3.2 as a function of 
the catalogue size, for $\beta \in \{0.32, 0.05, 0.01\}$. Which sources are placed together in a catalogue is 
determined randomly. To understand the statistical fluctuations in the efficiency when collecting 
sources into catalogues in different ways, we considered, for the same set of signals, 5000 random 
orderings in which the signals are combined into catalogues. The resulting median and the 68% 
confidence levels are shown as the central curve and the error bars, respectively. 
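The resampling over orderings can be sketched as follows. For simplicity this sketch assumes a single testing coefficient, so that the catalogue log odds is just the sum of the per-source log odds (with $\ln\alpha = 0$); the per-source values are synthetic Gaussian draws, and the function names are hypothetical. For each random ordering both pools are shuffled and split into catalogues of the given size, the threshold is taken from the background catalogues, and the efficiency is recorded; the median and the 16th and 84th percentiles then give a central value and a 68% band.

```python
import random
import statistics

def split_and_sum(per_source, size):
    """Group consecutive sources into catalogues of `size` and sum their log odds."""
    n_cat = len(per_source) // size  # leftover sources are dropped
    return [sum(per_source[i * size:(i + 1) * size]) for i in range(n_cat)]

def efficiency_band(background, foreground, size, beta, n_orderings=5000, seed=0):
    """Median and 68% band of the efficiency over random orderings of the sources."""
    rng = random.Random(seed)
    effs = []
    for _ in range(n_orderings):
        bg, fg = background[:], foreground[:]
        rng.shuffle(bg)
        rng.shuffle(fg)
        bg_cat = sorted(split_and_sum(bg, size))
        fg_cat = split_and_sum(fg, size)
        # Threshold: at most a fraction beta of background catalogues lies above it.
        allowed = int(beta * len(bg_cat) + 1e-9)
        threshold = bg_cat[len(bg_cat) - allowed - 1]
        effs.append(sum(x > threshold for x in fg_cat) / len(fg_cat))
    cuts = statistics.quantiles(effs, n=100, method="inclusive")  # percentile cut points
    return statistics.median(effs), cuts[15], cuts[83]

# Synthetic per-source log odds for GR (background) and non-GR (foreground) signals.
rng = random.Random(42)
background = [rng.gauss(-0.5, 1.0) for _ in range(300)]
foreground = [rng.gauss(0.5, 1.0) for _ in range(300)]
for size in (1, 5, 15, 25):
    print(size, efficiency_band(background, foreground, size, beta=0.05, n_orderings=200))
```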




Figure 5. The efficiency at fixed false alarm probabilities as a function of the catalogue size for the 
example described in subsection 3.2 ($\delta\chi_4 = 0.2$). 5000 random orderings of the same set of 
sources were split into catalogues. The mean (central curve) and the 68% confidence intervals 
(error bars) are plotted. The efficiency rises sharply as a function of the number of sources, 
underscoring the importance of coherently considering all detected signals. 



As can be seen in Fig. 5, the efficiency rises sharply as a function of the catalogue 
size. This underscores the importance of considering all the detected sources in a coherent fashion, 
as was explained in subsection 2.4. Even though a single detection might not yield confidence in 
a deviation from GR, coherently adding information from multiple sources can rapidly increase 
this confidence. 

To put the numbers in Fig. 5 into perspective, the predicted detection rate for binary neutron star 
inspirals in the so-called 'realistic' case is 40 per year [18]. 
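To relate catalogue sizes to observing time, one can ask how likely it is to accumulate a given number of sources. The sketch below assumes, purely for illustration, that detections follow a Poisson process at the 'realistic' rate of 40 per year quoted above and that every detection enters the catalogue; the function name is hypothetical.

```python
import math

def prob_at_least(n_sources, rate_per_year, years):
    """P(N >= n_sources) for Poisson-distributed detections with mean rate * years."""
    mean = rate_per_year * years
    # Complement of the Poisson probability of observing fewer than n_sources events.
    p_fewer = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n_sources))
    return 1.0 - p_fewer

RATE = 40.0  # detections per year in the 'realistic' case
for n_sources in (15, 25):
    for years in (0.25, 0.5, 1.0):
        p = prob_at_least(n_sources, RATE, years)
        print(f"N >= {n_sources} within {years} yr: {p:.3f}")
```

At a mean of 40 detections per year, catalogues of 15 or 25 sources would on average be collected within roughly 0.4 or 0.6 years, respectively.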



4. Conclusions and discussion 

We have given two striking examples to support the claim that our method proposed in [49] 
can distinguish deviations that are not captured by the limited model waveforms, as long as the 
phase shift in the frequency range where the detectors are the most sensitive ($f \sim 150$ Hz) is 
comparable to that caused by a shift of at least $\delta\chi_3 \sim$ (a few) $\times 10^{-2}$, i.e. $\sim 5$ radians. 

In the first example, signals were studied that have a deviation in the phase with a mass-dependent 
power of frequency, effectively ranging from 0.5PN to 1.5PN as the total mass is 
varied from the lowest to the highest value we consider. The magnitude of the effect was such 
that at $f = 150$ Hz, the change in phase ($\sim 13$ radians) was about the same as that induced 
by a constant relative shift $\delta\chi_3 = -0.1$. The odds ratio for individual sources already showed 
that such deviations can be measured with confidence. When sources were combined into catalogues 
of 15 each, the confidence in having detected a deviation improved drastically, and a deviation 
of this kind will be measurable with a false alarm probability of essentially zero. 

We further showed results for signals with a shift in the 2PN phase coefficient, $\psi_4$. Setting 
$\delta\chi_4 = 0.2$, the induced change in phase at $f = 150$ Hz is comparable to that of a constant shift 
$\delta\chi_3 = -0.035$ at 1.5PN, namely 4.5 radians. The choice of a modification at 2PN was inspired 
by corrections to the phase if one considers a modified Einstein-Hilbert action containing terms 
that are quadratic in the Riemann tensor, as calculated in [66, 67, 68]. As can be seen from Fig. 5, 
the efficiency for a maximum false alarm probability of 1% is essentially unity for catalogues 
comprising more than 25 sources. 
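The phase comparisons quoted above can be checked at the order-of-magnitude level with the stationary-phase (TaylorF2) inspiral phase. The sketch assumes the convention $\Psi(f) = 2\pi f t_c - \phi_c - \pi/4 + \frac{3}{128\eta v^5}\sum_i \psi_i v^i$ with $v = (\pi M f)^{1/3}$, in which a relative shift $\psi_i \to \psi_i(1+\delta\chi_i)$ changes the phase by $\Delta\Psi = \frac{3}{128\eta}\,\psi_i\,\delta\chi_i\,v^{i-5}$. It evaluates this for a hypothetical equal-mass $1.4 + 1.4\,M_\odot$ binary at $f = 150$ Hz; since the simulated sources span a range of masses, only approximate agreement with the quoted numbers should be expected. The function names are hypothetical.

```python
import math

T_SUN = 4.925491e-6  # G * M_sun / c**3 in seconds

def psi_3():
    """1.5PN TaylorF2 phase coefficient (tail term)."""
    return -16.0 * math.pi

def psi_4(eta):
    """2PN TaylorF2 phase coefficient for symmetric mass ratio eta."""
    return 15293365.0 / 508032.0 + 27145.0 / 504.0 * eta + 3085.0 / 72.0 * eta**2

def phase_shift(delta_chi, psi_i, i, f, m_total, eta):
    """Change in Psi(f) when psi_i -> psi_i * (1 + delta_chi); m_total in solar masses."""
    v = (math.pi * m_total * T_SUN * f) ** (1.0 / 3.0)
    return 3.0 / (128.0 * eta) * psi_i * delta_chi * v ** (i - 5)

# Hypothetical 1.4 + 1.4 M_sun binary evaluated at f = 150 Hz.
m_total, eta, f = 2.8, 0.25, 150.0
for label, shift in [
    ("dchi3 = -0.1", phase_shift(-0.1, psi_3(), 3, f, m_total, eta)),
    ("dchi3 = -0.035", phase_shift(-0.035, psi_3(), 3, f, m_total, eta)),
    ("dchi4 = +0.2", phase_shift(0.2, psi_4(eta), 4, f, m_total, eta)),
]:
    print(f"{label}: {shift:.2f} rad")
```

For this system the three shifts come out at roughly 13.5, 4.7 and 4.6 radians, in line with the values of about 13 and 4.5 radians quoted in the text.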



We also investigated the effect of the catalogue size on our confidence in detecting a 
deviation. In general, this confidence rises sharply with the number of sources in the catalogue, 
underscoring the necessity of combining information from multiple sources in the advanced detector 
era. 

Finally, we mention some necessary future developments. First and foremost, the 
most accurate waveforms will need to be incorporated in order to distinguish between genuine 
effects predicted by GR and possible deviations from GR. Especially for systems consisting 
of two black holes, or a neutron star and a black hole, these waveforms will need to include 
dynamical spins, sub-dominant signal harmonics, residual eccentricity, a description of the 
merger and ringdown, etc. The development of waveforms including these effects is ongoing 
[…]. Furthermore, the effects of realistic detector noise, 
instead of idealised stationary, Gaussian noise, need to be studied in detail. 

Once the advanced detectors have reached their design sensitivities and a number of detections 
have been made (e.g. using template-based searches [80]), a test of General Relativity using 
compact binary coalescences could go as follows. Starting from the best available GR waveforms, 
one introduces parameterised deformations in phase as well as in amplitude, leading to disjoint 
hypotheses $H_{i_1 i_2 \ldots i_k}$, the logical 'or' of all of which is $\mathcal{H}_{\rm modGR}$. Next, many 
injections of GR waveforms are performed into real or realistic data and collected into 'catalogues' 
to establish a background distribution for the log odds ratio $\ln O^{\rm modGR}_{\rm GR}$, and a suitable 
threshold $\ln O_\beta$ is set below which a deviation from GR will not be accepted. Then 
$\ln O^{\rm modGR}_{\rm GR}$ is computed for the catalogue of sources that were actually detected. If this 
number is above threshold, a violation of GR is likely. 
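The first step of this procedure, forming the disjoint hypotheses $H_{i_1 i_2 \ldots i_k}$, amounts to listing every non-empty subset of the chosen testing coefficients, one hypothesis per subset in which exactly those coefficients may differ from their GR values. A minimal sketch, with hypothetical coefficient labels and function name: since in this framework each sub-hypothesis requires its own evidence calculation for every source, the count $2^{N_T}-1$ also shows why the number of testing parameters is limited by computational cost, as discussed in the next paragraph.

```python
from itertools import combinations

def sub_hypotheses(testing_coeffs):
    """All non-empty subsets of the testing coefficients, one per hypothesis H_{i1...ik}."""
    return [subset
            for k in range(1, len(testing_coeffs) + 1)
            for subset in combinations(testing_coeffs, k)]

# Hypothetical labels for three phase testing coefficients.
for hyp in sub_hypotheses(["dchi1", "dchi2", "dchi3"]):
    print(hyp)

# Growth of the number of hypotheses with the number of testing coefficients N_T.
for n_t in range(1, 7):
    print(n_t, len(sub_hypotheses(list(range(n_t)))))
```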

The number of testing parameters one can consider will be limited, mainly by the 
computational restrictions one will have in the advanced detector era. Our method is meant, 
first and foremost, to establish whether or not a violation of GR of whichever kind is plausible, 
and not mainly to pinpoint the eventual alternative theory of gravity responsible for the GW 
signal, nor to estimate the parameters of the alternative model: it is unlikely that low-SNR 
signals such as those expected for the advanced stage of LIGO/Virgo will enable both the detection 
of a GR deviation and an identification of its nature. However, once a deviation is found, a follow-up 
investigation can be performed with our inference method in an attempt to find out its precise 
nature by trying different alternatives to GR, i.e. using waveforms inspired by specific (families 
of) alternative theories of gravity. A version of the so-called parameterised post-Einsteinian 
waveform family [43, 42] could be useful in this respect. In this regard we recall that our 
framework is not tied to any particular waveform family. 

The results of [49], and the further investigations presented here, motivate the construction 
of a full data analysis pipeline based on the method we have presented. Although much work 
remains to be done on the data analysis side, the advanced detectors will enable us to go well 
beyond the tests performed using the observed binary pulsars, and give us our very first empirical 
access to the genuinely strong-field dynamics of space-time. 